4. REGULAR EXPRESSIONS
Elvis uses regular expressions for searching and
substitutions. A regular expression is a text string in which
some characters have special meanings. This is much more
powerful than simple text matching.
Syntax
Elvis' regexp package treats the following one- or
two-character strings (called meta-characters) in special
ways:
\(subexpression\)
The \( and \) metacharacters are used to delimit
subexpressions. When the regular expression matches
a particular chunk of text, Elvis will remember
which portion of that chunk matched the subexpression.
The :s/regexp/newtext/ command makes use of
this feature.
^ The ^ metacharacter matches the beginning of a line.
If, for example, you wanted to find "foo" at the
beginning of a line, you would use a regular expres-
sion such as /^foo/. Note that ^ is only a meta-
character if it occurs at the beginning of a regular
expression; anyplace else, it is treated as a normal
character.
$ The $ metacharacter matches the end of a line. It
is only a metacharacter when it occurs at the end of
a regular expression; elsewhere, it is treated as a
normal character. For example, the regular expres-
sion /$$/ will search for a dollar sign at the end
of a line.
\< The \< metacharacter matches a zero-length string at
the beginning of a word. A word is considered to be
a string of 1 or more letters and digits. A word
can begin at the beginning of a line or after 1 or
more non-alphanumeric characters.
\> The \> metacharacter matches a zero-length string at
the end of a word. A word can end at the end of the
line or before 1 or more non-alphanumeric charac-
ters. For example, /\<end\>/ would find any
instance of the word "end", but would ignore any
instances of e-n-d inside another word such as
"calendar".
. The . metacharacter matches any single character.
[character-list]
This matches any single character from the
character-list. Inside the character-list, you can
denote a span of characters by writing only the
first and last characters, with a hyphen between
them. If the character-list is preceded by a ^
character, then the list is inverted -- it will
match any character that isn't mentioned in the list.
For example, /[a-zA-Z]/ matches any letter, and
/[^ ]/ matches anything other than a blank.
\{n\} This is a closure operator, which means that it can
only be placed after something that matches a single
character. It controls the number of times that the
single-character expression should be repeated.
The \{n\} operator, in particular, means that the
preceding expression should be repeated exactly n
times. For example, /^-\{80\}$/ matches a line of
eighty hyphens, and /\<[a-zA-Z]\{4\}\>/ matches any
four-letter word.
\{n,m\} This is a closure operator which means that the
preceding single-character expression should be
repeated between n and m times, inclusive. If the m
is omitted (but the comma is present) then m is
taken to be infinity. For example, /"[^"]\{3,5\}"/
matches any pair of quotes which contains three,
four, or five non-quote characters.
* The * metacharacter is a closure operator which
means that the preceding single-character expression
can be repeated zero or more times. It is
equivalent to \{0,\}. For example, /.*/ matches a
whole line.
\+ The \+ metacharacter is a closure operator which
means that the preceding single-character expression
can be repeated one or more times. It is equivalent
to \{1,\}. For example, /.\+/ matches a whole line,
but only if the line contains at least one character.
It doesn't match empty lines.
\? The \? metacharacter is a closure operator which
indicates that the preceding single-character
expression is optional -- that is, that it can occur
0 or 1 times. It is equivalent to \{0,1\}. For
example, /no[ -]\?one/ matches "no one", "no-one",
or "noone".
Anything else is treated as a normal character which
must exactly match a character from the scanned text. The
special strings may all be preceded by a backslash to force
them to be treated normally.
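
To experiment with the syntax described above outside of
Elvis, the following is a minimal sketch in Python that
translates a magic-mode pattern (written without its /
delimiters) into the notation of Python's re module. It is
not Elvis' own code. The function name elvis_to_python and
the WORD constant are hypothetical; the sketch assumes
Python 3.7 or later and copies bracket lists through
unchanged, which may differ from Elvis in corner cases.

```python
import re

WORD = "[A-Za-z0-9]"  # the text defines a word as letters and digits


def elvis_to_python(pattern):
    """Translate a magic-mode pattern into Python re syntax (sketch)."""
    out = []
    i, n = 0, len(pattern)
    while i < n:
        c = pattern[i]
        if c == "\\" and i + 1 < n:
            d = pattern[i + 1]
            i += 2
            if d in "()+?":
                out.append(d)                  # \( \) \+ \? are operators
            elif d == "{":
                end = pattern.find("\\}", i)   # copy the n or n,m counts
                if end == -1:
                    raise ValueError("unterminated \\{ in pattern")
                out.append("{" + pattern[i:end] + "}")
                i = end + 2
            elif d == "<":                     # zero-length start of word
                out.append("(?<!" + WORD + ")(?=" + WORD + ")")
            elif d == ">":                     # zero-length end of word
                out.append("(?<=" + WORD + ")(?!" + WORD + ")")
            else:
                out.append(re.escape(d))       # backslash forces a literal
        elif c == "[" and "]" in pattern[i + 1:]:
            end = pattern.index("]", i + 1)    # copy the list verbatim
            out.append(pattern[i:end + 1])
            i = end + 1
        elif c == "^" and i == 0:
            out.append("^")                    # special only at the start
            i += 1
        elif c == "$" and i == n - 1:
            out.append("$")                    # special only at the end
            i += 1
        elif c in ".*":
            out.append(c)
            i += 1
        else:
            out.append(re.escape(c))           # everything else is literal
            i += 1
    return "".join(out)


def elvis_search(pattern, line):
    """Return the matched text, or None if the line does not match."""
    m = re.search(elvis_to_python(pattern), line)
    return m.group(0) if m else None
```

For instance, elvis_search(r"\<end\>", "the end.") finds
"end", while the same pattern finds nothing in "calendar".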
Substitutions
The :s command has at least two arguments: a regular
expression, and a substitution string. The text that
matched the regular expression is replaced by text which is
derived from the substitution string.
Most characters in the substitution string are copied
into the text literally but a few have special meaning:
&       Insert a copy of the original text
~       Insert a copy of the previous replacement text
\1      Insert a copy of that portion of the original text which
        matched the first set of \( \) parentheses
\2-\9   Do the same for the second (etc.) pair of \( \)
\U      Convert all chars of any later & or \# to uppercase
\L      Convert all chars of any later & or \# to lowercase
\E      End the effect of \U or \L
\u      Convert the first char of the next & or \# to uppercase
\l      Convert the first char of the next & or \# to lowercase
These may be preceded by a backslash to force them to
be treated normally. If "nomagic" mode is in effect, then &
and ~ will be treated normally, and you must write them as
\& and \~ for them to have special meaning.
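
The rules above can be modelled with a short Python
sketch that expands a substitution string for a single
match, as in magic mode. It is an illustration, not Elvis'
implementation: the name expand and its previous parameter
(standing in for the previous replacement text used by ~)
are hypothetical, and the order in which \U or \L combines
with \u or \l is an assumption the text does not spell out.

```python
import re


def expand(template, match, previous=""):
    """Expand a substitution string for one regex match (sketch)."""
    out = []
    case_all = None    # "U" or "L" after \U or \L, cleared by \E
    case_next = None   # "u" or "l", applies to the next insertion only

    def insert(text):
        nonlocal case_next
        if case_all == "U":
            text = text.upper()
        elif case_all == "L":
            text = text.lower()
        if case_next == "u":
            text = text[:1].upper() + text[1:]
        elif case_next == "l":
            text = text[:1].lower() + text[1:]
        case_next = None
        out.append(text)

    i = 0
    while i < len(template):
        c = template[i]
        if c == "\\" and i + 1 < len(template):
            d = template[i + 1]
            i += 2
            if d in "123456789":
                insert(match.group(int(d)) or "")   # \1 through \9
            elif d in "UL":
                case_all = d
            elif d == "E":
                case_all = None
            elif d in "ul":
                case_next = d
            else:
                out.append(d)        # e.g. \& or \~ taken literally
        elif c == "&":
            insert(match.group(0))   # copy of the original text
            i += 1
        elif c == "~":
            out.append(previous)     # previous replacement text
            i += 1
        else:
            out.append(c)            # ordinary text is copied as-is
            i += 1
    return "".join(out)
```

With m = re.search(r"(.*):(.*)", "key:value"), the call
expand(r"\2:\1", m) gives "value:key". Note that the
pattern is written here in Python's notation, without the
backslashes before the parentheses.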
Options
Elvis has two options which affect the way regular
expressions are used. These options may be examined or set
via the :set command.
The first option is called "[no]magic". This is a
boolean option, and it is "magic" (TRUE) by default. While
in magic mode, all of the meta-characters behave as
described above. In nomagic mode, only ^ and $ retain their
special meaning.
The second option is called "[no]ignorecase". This is
a boolean option, and it is "noignorecase" (FALSE) by
default. While in ignorecase mode, the searching mechanism
will not distinguish between an uppercase letter and its
lowercase form. In noignorecase mode, uppercase and lower-
case are treated as being different.
Also, the "[no]wrapscan" option affects searches.
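
As a rough analogy in Python, the sketch below shows how
the first two options change a search. The function name
compile_search and its parameters are hypothetical. In
magic mode the pattern is taken in Python's own re
notation rather than Elvis'; in nomagic mode the sketch
keeps only a leading ^ and a trailing $ as special, as
described above, and deliberately ignores any backslash
forms. The wrapscan option is not modelled.

```python
import re


def compile_search(pattern, magic=True, ignorecase=False):
    """Compile a search pattern under the given option settings (sketch)."""
    flags = re.IGNORECASE if ignorecase else 0
    if not magic:
        # nomagic: only a leading ^ and a trailing $ stay special
        head = "^" if pattern.startswith("^") else ""
        body = pattern[len(head):]
        tail = "$" if body.endswith("$") else ""
        if tail:
            body = body[:-1]
        pattern = head + re.escape(body) + tail
    return re.compile(pattern, flags)
```

With magic=False, the pattern "a.c" matches only the
literal text "a.c", whereas with magic=True the dot matches
any single character.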
Examples
This example changes every occurrence of "utilize" to
"use":
:%s/utilize/use/g
This example deletes all whitespace that occurs at the
end of a line anywhere in the file. (The brackets contain a
single space and a single tab.):
:%s/[ ]\+$//
This example converts the current line to uppercase:
:s/.*/\U&/
This example underlines each letter in the current
line, by changing it into an "underscore backspace letter"
sequence. (The ^H is entered as "control-V backspace".):
:s/[a-zA-Z]/_^H&/g
This example locates the last colon in a line, and
swaps the text before the colon with the text after the
colon. The first \( \) pair is used to delimit the stuff
before the colon, and the second pair delimit the stuff
after. In the substitution text, \1 and \2 are given in
reverse order to perform the swap:
:s/\(.*\):\(.*\)/\2:\1/
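
For comparison, the same five edits can be approximated
with Python's re module, one line of text at a time. This
is only a sketch: the function names are hypothetical, and
Python writes +, ( and ) without the backslashes that
Elvis requires.

```python
import re


def use_not_utilize(text):
    # like :%s/utilize/use/g
    return re.sub(r"utilize", "use", text)


def strip_trailing_blanks(line):
    # like :%s/[ <tab>]\+$// where the list holds a space and a tab
    return re.sub(r"[ \t]+$", "", line)


def upper_line(line):
    # like :s/.*/\U&/
    return re.sub(r".*", lambda m: m.group(0).upper(), line, count=1)


def underline_letters(line):
    # like :s/[a-zA-Z]/_^H&/g, where \x08 is the backspace character
    return re.sub(r"[a-zA-Z]", lambda m: "_\x08" + m.group(0), line)


def swap_around_last_colon(line):
    # like :s/\(.*\):\(.*\)/\2:\1/ ; the first .* is greedy, so the
    # colon that separates the two groups is the last one in the line
    return re.sub(r"(.*):(.*)", r"\2:\1", line, count=1)
```

Because the first group is greedy, "a:b:c" becomes "c:a:b"
rather than "b:c:a".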
August 10, 1992